Modularity and Extreme Edges of the Internet 
We study the spectral properties of a diffusion process taking place on the Internet network 
focusing on the slowest decaying modes. These modes allow us to identify an underlying modular 
structure of the Internet roughly corresponding to individual countries. For instance, in the slowest 
decaying mode the diffusion current flows from Russia towards US military sites. These two regions 
thus constitute the extreme edges of the Internet. Quantitatively, the modular structure of the 
Internet manifests itself in an approximately 10 times larger participation ratio of its slowly decaying 
modes compared to the null model, a random scale-free network. We propose to use the fraction of 
nodes participating in slowly decaying modes as a general measure of the modularity of a network. For 
the 100 slowest decaying modes of the Internet we measured this fraction to be around 30%. Finally, 
we suggest that the degree of isolation of an individual module can be assessed by comparing its 
participation in different diffusion modes. Using the proportionality of response as a criterion, we 
find that the independent module approximation works well for the Internet. 

PACS numbers: 89.75.-k, 89.20.Hh, 89.75.Hc, 05.40.Fb 



Virtually any complex system has an underlying net- 
work that defines the backbone of interactions among 
its components. Examples of such networks include the 
Internet and the World Wide Web, molecular networks 
of living cells, food webs in ecosystems, etc. An impor- 
tant question is whether nodes of such a network can 
be divided into smaller sub-networks (modules), which 
interact with each other relatively weakly [Q. Estimat- 
ing the strength of inter-modular interactions, localiz- 
ing crucial links connecting these modules to each other, 
and finding pairs of modules which are the most distant 
from each other is important for several reasons. First 
of all, it serves as a test of stability of the system with 
respect to breaking it up into truly isolated components. 
Such a break-up would be undesirable in, for example, 
the Internet. Creation of extra connections between the 
most distant modules in the network and reinforcement 
of crucial links is an efficient way to increase its stabil- 
ity. Secondly, by measuring the relative strength of inter- 
and intra-modular connections one directly assesses the 
quality of the independent module approximation, which 
may turn out to be important in modeling the actual 
dynamics of a given complex system. 

In this work we explore the modular structure present 
in the physical layout of the Internet. To this end we 
study an auxiliary diffusion process taking place on this 
network. The slowest modes of diffusion, easily iden- 
tifiable from the spectrum of its transfer matrix, allow 
us to detect the weakly interacting modules of the Internet. 
These modules turn out to roughly correspond 
to individual countries or, for large countries, to cultural 
or geographical regions within the country. Of course, 
the diffusion process studied in this work does not de- 
scribe the real dynamics of the information flow over the 
Internet. However, the detected modular features play 
an important role in any local dynamical process taking 
place on this network including the real Internet traffic. 

Analysis of spectral properties of a similar diffusion 
process lies at the heart of the popular search engine 
www.google.com |^]. Its variants have also been ap- 
plied to social networks (the correspondence analysis) [|| , 
random and small-world networks (the Laplace equation 
analysis) Q|, artificial scale-free networks, |€| and the 
community structure of the World Wide Web | ^ . 

In this work we explore the physical layout of the Inter- 
net on a coarse-grained level of the so-called Autonomous 
Systems (AS), which are large groups of routers and 
servers belonging to one organization such as a univer- 
sity or a business enterprise (e.g. an Internet Service 
Provider). To this end we use the January 3, 2000 dataset 
when the Internet consisted of 6474 Autonomous Systems 
exchanging information via 12572 undirected links § . As 
expected for the Internet, any pair of Autonomous Sys- 
tems is connected to each other by at least one path, so 
that topologically the network consists of just one large 
cluster. The diffusion process we analyze here describes 
the dynamics of a large number of random walkers mov- 
ing on the network at discrete time steps. Statistical 
properties of returns to the origin of such random walks 
have recently been used to measure the effective dimen- 
sionality of several complex networks px| ] . At each time- 
step every walker moves from its current node to one of 
the neighboring nodes along a randomly selected link. 
The average dynamics of this process is described by 

$$p_i(t+1) = \sum_j T_{ij}\, p_j(t) , \qquad (1)$$

where $p_i(t)$ is the expectation value of the number of 
random walkers at site $i$ and time $t$. The elements $T_{ij}$ of the 
transfer matrix are equal to $1/K_j$ for neighboring nodes 
$i$ and $j$ and zero otherwise. Here $K_j$ is the connectivity 
(the number of immediate neighbors) of the node $j$ from 
which a walker steps to the node $i$. Note that $\sum_i T_{ij} = 1$, 
so that the total number of walkers is conserved at all 
times. Eq. (1) can also be rewritten as a discrete time 
diffusion equation $p_i(t+1) - p_i(t) = \sum_j (T_{ij} - \delta_{ij})\, p_j(t)$. 
Hence the diffusion matrix D is related to the transfer 
matrix T simply as 

$$D = T - 1 . \qquad (2)$$

As time advances the distribution of random walkers 
approaches a steady state $p_i(\infty)$ in which the diffusion 
current flowing from a node $i$ to a node $j$ is exactly balanced 
by that flowing from $j$ to $i$. This is satisfied when the 
average number of walkers $p_i(\infty)$ on every node $i$ is 
proportional to its connectivity $K_i$. 
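
The update rule of Eq. (1) and the steady state can be checked numerically on a small graph. The sketch below is only an illustration under stated assumptions: the four-node edge list is a made-up toy graph, not the AS dataset, and the names `transfer_step`, `neighbors` and `expected` are hypothetical helpers introduced here.

```python
from collections import defaultdict

def transfer_step(p, neighbors):
    """One step of Eq. (1): p_i(t+1) = sum_j T_ij p_j(t), with T_ij = 1/K_j."""
    new_p = {i: 0.0 for i in neighbors}
    for j, nbrs in neighbors.items():
        share = p[j] / len(nbrs)  # walkers at j split evenly over its K_j links
        for i in nbrs:
            new_p[i] += share
    return new_p

# Hypothetical toy graph (not the AS data): a triangle 0-1-2 plus a pendant node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
neighbors = defaultdict(list)
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

# Start with all 1000 walkers on the pendant node and iterate.
p = {i: 0.0 for i in neighbors}
p[3] = 1000.0
for _ in range(200):
    p = transfer_step(p, neighbors)

total = sum(p.values())  # conserved, because every column of T sums to 1
degrees = {i: len(nbrs) for i, nbrs in neighbors.items()}
expected = {i: 1000.0 * degrees[i] / sum(degrees.values()) for i in neighbors}
```

After enough steps the occupation numbers in `p` should agree with `expected`, that is, they become proportional to the connectivity of each node. The toy graph contains a triangle on purpose: on a bipartite graph the walk would oscillate instead of settling.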

The relaxation of any initial distribution of random 
walkers among nodes, $p_i(0)$, towards the steady state 
configuration $p_i(\infty)$ is determined by the spectral 
properties of the matrix $T$ (or alternatively $D$). For 
instance, the steady state configuration $p_i(\infty)$ itself is 
proportional to the principal eigenvector $p^{(1)}$ of $T$ 
corresponding to its largest eigenvalue $\lambda^{(1)} = 1$, which is 
unique for single component networks such as the Internet. 
The remaining eigenvectors $p^{(\alpha)}$ describe the 
decay of the initial configuration towards the steady state 
with a characteristic decay time $\tau^{(\alpha)}$ related to the 
corresponding eigenvalue $\lambda^{(\alpha)}$ through 
$\exp(-1/\tau^{(\alpha)}) = |\lambda^{(\alpha)}|$. 
Note that in general there exist both non-oscillatory 
($\lambda^{(\alpha)} \approx 1$) and oscillatory ($\lambda^{(\alpha)} \approx -1$) slowly decaying 
modes. 
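
To get a feeling for the time scales involved, the relation between eigenvalue and decay time can be inverted directly. The snippet below is a minimal sketch; the function name `decay_time` is hypothetical, and the value 0.9626 is the eigenvalue quoted later in the text for the slowest decaying mode.

```python
import math

def decay_time(eigenvalue):
    """Decay time tau from exp(-1/tau) = |lambda|, in units of time steps."""
    magnitude = abs(eigenvalue)
    if not 0.0 < magnitude < 1.0:
        raise ValueError("decay time is finite only for 0 < |lambda| < 1")
    return -1.0 / math.log(magnitude)

tau_slowest = decay_time(0.9626)       # slowest non-oscillatory mode quoted in the text
tau_oscillatory = decay_time(-0.9626)  # hypothetical oscillatory mode of equal magnitude
```

Under this relation the slowest mode decays over roughly 26 time steps. An oscillatory mode with the same magnitude has the same decay time and differs only in that its amplitude changes sign at every step.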

The modularity of a given complex network is reflected 
in statistical properties of its diffusion eigenvectors 
$p^{(\alpha)}$. One such property is the Participation Ratio (PR), 
which quantifies the effective number of nodes 
participating in a given eigenvector with a significant weight. 
In the Internet the components of the principal eigenvector 
$p_i^{(1)} \propto p_i(\infty) \propto K_i$, as well as those of other slowly 
decaying modes, are broadly distributed (scale-free) 
and as such tend to be localized on just a few highly 
connected nodes. In this case participation ratios are best 
calculated using the normalized eigenvector 

$$c_i^{(\alpha)} = p_i^{(\alpha)} / K_i \qquad (3)$$

of outgoing currents flowing from node $i$ along each of its 
links. More formally, $c_i^{(\alpha)}$ is also the eigenvector of the 
transposed transfer matrix $T^{T}$ with the same eigenvalue 
$\lambda^{(\alpha)}$. For such a vector normalized by $\sum_i (c_i^{(\alpha)})^2 = 1$ the 
participation ratio is defined as 
$PR^{(\alpha)} = \left[ \sum_{i=1}^{N} (c_i^{(\alpha)})^4 \right]^{-1}$. 
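
The participation ratio is easy to evaluate once a current vector is available. The following sketch assumes the standard definition, the inverse of the sum of fourth powers of a vector normalized to unit sum of squares. The function names and the test vectors are hypothetical and chosen only to show the two limiting cases.

```python
def participation_ratio(c):
    """PR = (sum c_i^2)^2 / sum c_i^4, i.e. 1 / sum c_i^4 once sum c_i^2 = 1."""
    s2 = sum(x * x for x in c)
    s4 = sum(x ** 4 for x in c)
    return s2 * s2 / s4

def outgoing_currents(p, degrees):
    """Eq. (3): divide each eigenvector component p_i by the connectivity K_i."""
    return [pi / ki for pi, ki in zip(p, degrees)]

# Hypothetical vectors on a 10-node network.
uniform_on_module = [1.0] * 4 + [0.0] * 6  # equal weight on 4 nodes
single_hub = [1.0] + [0.0] * 9             # localized on a single node
pr_module = participation_ratio(uniform_on_module)  # 4 nodes participate
pr_hub = participation_ratio(single_hub)            # 1 node participates

# A steady state p_i proportional to K_i gives a flat current vector,
# so its PR equals the network size.
degrees = [5, 3, 3, 2, 2, 1, 1, 1, 1, 1]
steady_state = [float(k) for k in degrees]
pr_steady = participation_ratio(outgoing_currents(steady_state, degrees))
```

The last case illustrates why the division by $K_i$ matters: without it the steady state of a scale-free network would look localized on the hubs, while the current vector spreads evenly over all nodes.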

In Fig. 1 the participation ratio of eigenvectors $c_i^{(\alpha)}$ 
(top) and the eigenvalue density (i.e. the spectrum of 
the matrix) (bottom) are plotted as a function of the 
corresponding eigenvalue $-1 < \lambda^{(\alpha)} < 1$. The data for the 

FIG. 1: The participation ratio $PR^{(\alpha)}$ (top, A) and the 
eigenvalue density (bottom, B) as a function of the eigenvalue 
$-1 < \lambda^{(\alpha)} < 1$ measured in the Internet (filled circles) and in 
its randomized counterpart (open squares) - a Random 
Scale-Free Network (RSFN). The participation ratio was averaged 
over $\lambda$-bins of size 0.05 excluding eigenmodes $\lambda^{(\alpha)}$ = [g_5[, 
and $\lambda^{(1)} = 1$. Notice that for $|\lambda| \approx 1$ participation ratios in 
the Internet significantly exceed those in an RSFN, indicating 
the modular character of the former network. 

Internet (filled circles) is displayed together with the data 
for its randomized counterpart (open squares). The ran- 
domization of the Internet was performed in such a way 
that the connectivity of every node is strictly preserved 
p3[ . It was argued ^ that such a network consti- 
tutes a proper null model of a given complex network. 
Since this random network has the same scale-free 
distribution of connectivities as the Internet [12] it will be 
referred to as a Random Scale-Free Network (RSFN). 
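
One common way to randomize a network while strictly preserving every node's connectivity is repeated pairwise link swapping. The sketch below illustrates that idea; whether it matches the cited procedure in every detail is an assumption, and the function names and the 10-node toy graph are hypothetical. The input is assumed to be a simple graph (no self-loops, no multiple links).

```python
import random
from collections import Counter

def degree_counts(edges):
    """Connectivity K_i of every node appearing in the edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

def randomize_preserving_degrees(edges, n_swaps, seed=0):
    """Swap partners of random link pairs: (a,b),(c,d) -> (a,d),(c,b)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick the orientation of the second link at random
        if a == d or c == b:
            continue  # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # the swap would create a multiple link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

# Hypothetical toy graph: a ring of 10 nodes with two chords.
toy_edges = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
shuffled = randomize_preserving_degrees(toy_edges, n_swaps=50, seed=1)
```

Each accepted swap removes one link from each of the four nodes involved and gives one back, so all connectivities are unchanged. This sketch does not check that the randomized graph stays connected; a real analysis would need to handle that separately.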

Comparing the data for the Internet and an RSFN 
we note that while the density of states is rather 
similar in these two networks (Fig. 1B), the 
participation ratio of the slowly decaying modes (especially for 
the non-oscillatory ones with $\lambda$ close to 1) is markedly 
higher in the Internet than in an RSFN (Fig. 1A). In 
these non-oscillatory modes the diffusion current flows 
from relatively isolated regions (modules) along the few 
links connecting them to the rest of the network. If for 
such a module these links were hypothetically cut 
one by one, the corresponding eigenvalue would 
gradually increase towards unity, $\lambda^{(\alpha)} \to 1$, while the 
eigenvector would become more and more localized on the 
module. When finally the module is completely 
disconnected from the network the eigenvector has evolved to 
the steady state solution on the module, which has the 
participation ratio equal to its size. Thus the PR of 
slowly decaying eigenmodes serves as a good 
quantitative estimate of the size of modules in the network. In 
an RSFN these modules are small, consisting of just a 
handful of nodes that accidentally happen to be loosely 
connected to the rest of the network. The fact that the 
participation ratios of slowly decaying modes on the 
Internet significantly exceed those in an RSFN indicates that 
the corresponding modules are real and not accidental. 
The average participation ratio of slowly decaying modes 
can be quantified by 
$\sum_\alpha PR^{(\alpha)} |\lambda^{(\alpha)}|^k / \sum_\alpha |\lambda^{(\alpha)}|^k$. For 
$5 < k < 10$ this average changes only slowly in both the 
Internet and an RSFN and equals approximately 60 and 
5, respectively. A rough estimate for the number of 
different modules is given by the number of slowly decaying 
non-oscillatory states in Fig. 1A that have a 
participation ratio significantly exceeding that of an RSFN. For 
the Internet the number is around 100. The sum of the 
participation ratios for these first 100 modes, $\approx 5400$, 
is a rough estimate of the total number of nodes in the 
modular part of the network. This should be compared 
to the same sum being approximately equal to 520 in an 
RSFN. If one takes special care to avoid double 
counting nodes that appear more than once among the set of 
$PR^{(\alpha)}$ nodes with the largest $|c_i^{(\alpha)}|$ taken for each 
eigenmode $1 < \alpha < 100$, this number gets reduced to $\approx 1800$. 
Thus the overall modularity of the Internet network is at 
least $1800/6500 \approx 30\%$. 
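
The two counting steps in this paragraph can be sketched in a few lines. The code below assumes that the weighted average has the form $\sum_\alpha PR^{(\alpha)} |\lambda^{(\alpha)}|^k / \sum_\alpha |\lambda^{(\alpha)}|^k$ and that "the set of $PR^{(\alpha)}$ nodes" means the round(PR) nodes with the largest absolute current in each mode. All names, vectors and eigenvalues are hypothetical.

```python
def participation_ratio(c):
    """PR = (sum c_i^2)^2 / sum c_i^4."""
    s2 = sum(x * x for x in c)
    s4 = sum(x ** 4 for x in c)
    return s2 * s2 / s4

def weighted_average_pr(prs, eigenvalues, k):
    """Average PR with weights |lambda|^k, which emphasize slowly decaying modes."""
    weights = [abs(lam) ** k for lam in eigenvalues]
    return sum(pr * w for pr, w in zip(prs, weights)) / sum(weights)

def modular_nodes(modes):
    """Union over modes of the round(PR) nodes with the largest |c_i|."""
    covered = set()
    for c in modes:
        n_top = round(participation_ratio(c))
        ranked = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)
        covered.update(ranked[:n_top])
    return covered

# Hypothetical 10-node example with two overlapping modules.
mode_a = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # nodes 0..3
mode_b = [0.0, 0.0, -1.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0]  # nodes 2..5
prs = [participation_ratio(mode_a), participation_ratio(mode_b)]
naive_count = sum(prs)                               # counts nodes 2 and 3 twice
unique_count = len(modular_nodes([mode_a, mode_b]))  # each node counted once
modular_fraction = unique_count / len(mode_a)
avg_pr = weighted_average_pr(prs, [0.96, 0.95], k=5)
```

In this toy case the naive sum of participation ratios is 8, while only 6 distinct nodes are involved. This is the same kind of reduction as the drop from about 5400 to about 1800 reported for the Internet.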

To determine the organizing principle behind these 
Internet modules, in Fig. 2 we plot the outgoing current $c_i^{(2)}$ 
in the slowest decaying diffusion mode ($\lambda^{(2)} = 0.9626$) as 
a function of the AS number (note that some AS 
numbers are not yet in use). Autonomous Systems known 
to be located in Russia are marked with a circle. The 
PR for this eigenmode is 107, while the total number 
of Russian AS in our dataset is 174. In Fig. 2 one can 
see that almost all the Autonomous Systems that 
significantly participate in this mode (large positive $c_i^{(2)}$) are 
Russian. We have checked that the few exceptions to this 
rule are in fact Autonomous Systems closely related to 
Russia. Thus in the slowest decaying mode the diffusion 
current flows from a module that may be identified with 
Russia towards the rest of the Internet. Curiously enough, 
the set of Autonomous Systems furthest away from 
Russia (the most negative $c_i^{(2)}$) are located in the US and 
belong to the US Military. This possible legacy of the 
cold war makes Russia and the US Military the extreme 
edges of the Internet. Performing a similar analysis for 
other slowly decaying modes we get a similar picture, 
just with other pairs of countries being pulled out. For 
the Internet the modules thus correspond to individual 

FIG. 2: Components $c_i^{(2)}$ of the slowest decaying diffusion 
mode in the Internet (eigenvalue $\lambda^{(2)} = 0.9626$) as a 
function of the AS number. The AS known to be geographically 
located in Russia are marked with circles. The scale of the 
negative part of the y-axis is increased for clarity. Out of 
100 Autonomous Systems with the most negative components 
$c_i^{(2)}$, those 23 for which we were able to find the description 
are associated with the US Military. 



countries, or, for large countries, to organizational or 
geographical features within the country. 

It is interesting to note that these country-modules 
cannot be detected using the spectral analysis of the ad- 
jacency matrix of the network [|[ ^, 1). The elements 
of this matrix, closely related to $T$, are equal to 1 for a 
pair of neighboring nodes and 0 otherwise. The largest 
eigenvectors of the adjacency matrix are known to be lo- 
calized primarily on the highest connected hubs and their 
neighbors H, ^. However, unlike in the case of T, this 
undesirable localization cannot be properly eliminated 
simply by dividing the components of the eigenvectors 
by the connectivity $K_i$. Hence eigenvectors of the 
adjacency matrix do not properly reflect the country-based 
modular structure uncovered in this work. 

Having established that the Internet is indeed modular, 
we now address the question of how good these 
individual modules are. To this end we compare different 
eigenmodes $c_i^{(\alpha)}$ to each other. Although the primary 
feature in a slowly decaying eigenmode $c_i^{(\alpha)}$ is the flow 
between a dominant pair of country-modules (such as 
between Russia and the US Military in $c_i^{(2)}$), other 
modules may also participate in it but to a smaller extent. 
This gives rise to a fine structure within slowly decaying 
modes that is not captured by the participation ratio. 
The hallmark of a good module is that even though it 
participates in different eigenmodes $\alpha$ to a different 
extent, the relative distribution of currents within the 
module stays approximately the same. In other words, 
it enters different eigenmodes as just one degree of 
freedom. In this case the ratio $c_i^{(\alpha)}/c_j^{(\alpha)}$ is approximately 
independent of the eigenmode $\alpha$ for any pair of nodes $i$ 
and $j$ within the module. This is equivalent to the 
condition that for any two different eigenmodes $\alpha$ and $\beta$, 
$c_i^{(\alpha)}/c_i^{(\beta)} = \mathrm{const}$ for every node $i$ belonging to the given 
module. In Fig. 3 we plot the outgoing currents in the 
two slowest decaying non-oscillatory eigenmodes - $c_i^{(2)}$ 
and $c_i^{(3)}$ - as a function of each other. Similar plots can 
be made for other pairs of slowly decaying eigenmodes. The 

FIG. 3: The Internet clustering: Coordinates of the i-th AS 
in this plot are its components $(c_i^{(2)}, c_i^{(3)})$ in the two slowest 
decaying non-oscillatory diffusion modes. The color code 
reveals the geographical location of the AS (Russia - red squares, 
France - green circles, USA - blue crosses, Korea - orange 
triangles). Note the straight lines corresponding to good 
country-modules. 

principal feature in this kind of plot is a star-like shape, 
where different rays of the star correspond to individual 
country-modules. This type of plot is more powerful in 
identifying individual modules than the participation 
ratio alone. Indeed, in Fig. 3 one can easily detect not 
only the most excited modules like Russia, France, and 
US (red squares, green circles, and blue crosses), but also 
less excited ones like Korea (orange triangles). We 
believe that the idea of measuring the quality of individual 
modules by how proportionally their nodes participate 
in different slowly decaying modes can be easily 
generalized to other dynamical processes taking place on the 
network, such as spin dynamics, vibrational modes, 
etc. 
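
The proportionality criterion can be turned into a simple numerical score. The sketch below fits a line through the origin to the points $(c_i^{(\alpha)}, c_i^{(\beta)})$ of one candidate module and reports how well the points align with it. Using the absolute cosine between the two restricted vectors as the score is a choice made here for illustration, not a measure taken from the text, and all numbers are made up.

```python
import math

def proportionality(c_alpha, c_beta, module):
    """Slope of the best line through the origin, and |cosine| of the two modes on the module."""
    saa = sum(c_alpha[i] ** 2 for i in module)
    sbb = sum(c_beta[i] ** 2 for i in module)
    sab = sum(c_alpha[i] * c_beta[i] for i in module)
    slope = sab / saa
    alignment = abs(sab) / math.sqrt(saa * sbb)  # equals 1 only for exact proportionality
    return slope, alignment

# Hypothetical currents of 6 nodes in two slowly decaying modes.
c2 = [0.5, 0.3, 0.2, -0.1, -0.4, 0.05]
c3 = [1.0, 0.6, 0.4, 0.3, -0.1, -0.2]
good_module = [0, 1, 2]  # here c3 = 2 * c2, a single ray of the star
poor_module = [3, 4, 5]  # no common ratio between the two modes
slope_good, align_good = proportionality(c2, c3, good_module)
slope_poor, align_poor = proportionality(c2, c3, poor_module)
```

A module whose nodes lie on one ray gives an alignment close to 1, and the slope is the common ratio of its currents in the two modes. Nodes that do not move together give a clearly smaller value.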

Finally, we would like to point out another 
interesting feature of Fig. 1. Both the density of states and the 
participation ratio are nearly symmetric around $\lambda = 0$ 
for both the Internet and an RSFN. This near symmetry 
indicates that both these networks are almost bipartite 
[16], a feature also observed in citation networks, but 
not in metabolic networks [10]. In fact, a more detailed 
analysis shows that while almost every slow oscillatory 
mode ($\lambda^{(\mathrm{osc})} \approx -1$) is related to the corresponding 
non-oscillatory mode with $\lambda \approx |\lambda^{(\mathrm{osc})}|$, the reverse is not true 
as there are roughly 30% more modes near $\lambda = 1$ than 
near $\lambda = -1$. Those country-modules that are present 
in both the oscillatory and non-oscillatory parts of the 
spectrum are internally almost bipartite. The simplest 
bipartite graph is a tree and this seems to be the 
dominant structure within the Internet modules. 
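
The link between bipartiteness and a symmetric spectrum can be verified directly. On a bipartite graph, let $S$ flip the sign of a vector on one of the two parts; then $T S p = -S T p$ for any $p$, so every eigenvector with eigenvalue $\lambda$ has a partner $S p$ with eigenvalue $-\lambda$. The sketch below checks this identity on a small tree. The graph, the test vector and the function names are hypothetical.

```python
def transfer_apply(p, neighbors):
    """(T p)_i = sum over neighbors j of i of p_j / K_j."""
    out = [0.0] * len(p)
    for j, nbrs in enumerate(neighbors):
        for i in nbrs:
            out[i] += p[j] / len(nbrs)
    return out

def two_coloring(neighbors):
    """Return +1/-1 labels for the two parts of a bipartite graph, or None otherwise."""
    color = [0] * len(neighbors)
    for start in range(len(neighbors)):
        if color[start]:
            continue
        color[start] = 1
        stack = [start]
        while stack:
            j = stack.pop()
            for i in neighbors[j]:
                if color[i] == 0:
                    color[i] = -color[j]
                    stack.append(i)
                elif color[i] == color[j]:
                    return None  # odd cycle found, the graph is not bipartite
    return color

# Hypothetical small tree: hub 0 with leaves 1 and 2 and a branch 0-3-4.
tree = [[1, 2, 3], [0], [0], [0, 4], [3]]
s = two_coloring(tree)
p = [0.3, -1.2, 0.7, 2.0, 0.1]  # arbitrary test vector
lhs = transfer_apply([si * pi for si, pi in zip(s, p)], tree)  # T S p
rhs = [-si * x for si, x in zip(s, transfer_apply(p, tree))]   # -S T p
mismatch = max(abs(a - b) for a, b in zip(lhs, rhs))

triangle = [[1, 2], [0, 2], [0, 1]]  # an odd cycle breaks the symmetry
```

For the tree the mismatch vanishes up to rounding, while the triangle admits no two-coloring, which is consistent with odd cycles being what removes modes near $\lambda = -1$.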

In summary, we have demonstrated how a diffusion 
process taking place on the Internet network allows one 
to extract information about its modules and extreme 
edges. For many "real-world" complex networks the lo- 
cal context of a node (in terms of the linkage pattern) re- 
flects, or, perhaps, even determines the importance and 
function of the given node. For instance, in biology one 
can successfully assign putative functions to unclassified 
proteins based on the function of their interaction part- 
ners fl^ . In general the diffusion process introduced in 
this work can be seen as a systematic way to explore 
the local linkage structure of a network beyond just the 
nearest neighbors. The detection of the modular struc- 
ture of a network is just one possible application of such 
a process p^ . 